arXiv: 1502.05119v2 [quant-ph] 22 Feb 2015 


Non-exponential Fidelity Decay in Randomized Benchmarking with Low-Frequency Noise 


M. A. Fogarty, 1 M. Veldhorst, 1 R. Harper, 2 C. H. Yang, 1 S. D. Bartlett, 2 S. T. Flammia, 2 and A. S. Dzurak 1 

1 Centre for Quantum Computation and Communication Technology, 

School of Electrical Engineering and Telecommunications, 

The University of New South Wales, Sydney, NSW 2052, Australia. 

2 Centre for Engineered Quantum Systems, School of Physics, 

The University of Sydney, Sydney, NSW 2006, Australia 
(Dated: February 24, 2015) 

We show that non-exponential fidelity decays in randomized benchmarking experiments on quantum dot 
qubits are consistent with numerical simulations that incorporate low-frequency noise. By expanding standard 
randomized benchmarking analysis to this experimental regime, we find that such non-exponential decays are 
better modeled by multiple exponential decay rates, leading to an instantaneous control fidelity for isotopically-purified-silicon MOS quantum dot qubits which can be as high as 99.9% when low-frequency noise conditions
and system calibrations are favorable. These advances in qubit characterization and validation methods underpin 
the considerable prospects for silicon as a qubit platform for fault-tolerant quantum computation. 


Randomized benchmarking experiments [1–3] quantify the accuracy of quantum gates by estimating the average decay in control fidelity as a function of the number of operations applied to a qubit. Benchmarking enjoys several advantages over traditional methods of characterizing gate fidelity that involve quantum process tomography [4]: it is insensitive to state preparation and measurement (SPAM) errors, and it scales efficiently with the system size. As such, benchmarking protocols (see Fig. 1) have become a standard against which different qubit technologies and architectures are compared. Benchmarking experiments have been performed in many different technologies, including trapped ions [5, 6], superconducting qubits [7–9], nuclear magnetic resonance architectures [10], nitrogen-vacancy centers in diamond [11], semiconductor quantum dots in silicon [12], and phosphorus atoms in silicon [13]. Most experiments are fitted using an exponential fidelity decay, which is in line with the original theoretical predictions [1, 14] and consistent with the assumption of weak correlation between the noise on the gates, an assumption that is important for accurate fidelity estimates.

When the assumptions of randomized benchmarking are violated, there is no guarantee of observing the characteristic exponential decay curves determined by the average fidelity. This has been noted before in NMR experiments, due to spatial inhomogeneity across the sample [10], as well as in numerical simulations of 1/f noise [15]. Recent experimental results on spin-based silicon metal-oxide-semiconductor (Si-MOS) quantum dot qubits [12] have also shown non-exponential fidelity decay. Here we apply our theoretical modelling directly to these experiments, although our conclusions are widely applicable.

Here we argue that non-exponential fidelity decay is indeed indicative of a dephasing-limited decay caused by non-Markovian noise. We first propose a numerical simulation method that incorporates time-dependent effects, primarily a drift in frequency detuning. This detuning drift and other time-dependent low-frequency noise sources lead to decay curves that are effectively integrated over an ensemble of experimental results, each with a slightly different “instantaneous” average fidelity, i.e., a fidelity that is approximately stable over the course of a single benchmarking run but that drifts over the course of the entire sequence of experiments. These simulations show good qualitative agreement with the non-exponential decay observed in experiments on isotopically-purified silicon quantum dot qubits [12]. We then give a more quantitative analysis that compares two very simple models, both of which fit the data well: the first is a simplified version of the drift model that postulates that each experimental run has one of only two possible average fidelities; the second attributes the non-exponential decay to fluctuating SPAM errors. Both models have only one additional parameter over the standard benchmarking model, but our quantitative likelihood analysis shows that the simplified drift model is much more probable.

FIG. 1. a) Randomized benchmarking consists of applying multiple sequences of random Clifford gates, followed by a final recovery Clifford that ensures each sequence ends with the qubit in an eigenstate, and reading out the qubit state. In interleaved randomized benchmarking, an additional test Clifford gate is inserted between the random Cliffords. b) Bloch sphere representation of the decomposition of a general noisy operation $C_n$ into an ideal rotation $C_i$ followed by a noise operation $\mathcal{D}$.

The conclusion of this analysis for the Si-MOS quantum dot qubit is that, while the total average fidelity over a long series of benchmarking runs is 99.6% [12], the instantaneous fidelity can be as high as 99.9% or more when naturally fluctuating environmental noise sources and system calibrations are most favorable. Achieving such high fidelities for single-qubit gate operations gives optimism for exceeding the demanding error thresholds for fault-tolerant quantum computation.
I. BENCHMARKING REVIEW

The standard randomized benchmarking procedure involves subjecting a quantum system to long sequences of randomly sampled Clifford gates, followed by an inversion step and a measurement, as depicted in Fig. 1(a). The unitary operations of the Clifford group $\mathcal{G}$ are those that map the set of Pauli operators to itself under conjugation. They form a discrete set of gates that exactly reproduces the uniform average gate fidelity, averaged over the set of all input pure states [16]. An alternative version known as interleaved benchmarking [8] inserts a fixed gate, such as the H gate shown in Fig. 1(a), between the random gates. The difference from the reference sequence gives information about the average gate fidelity of that specific gate, rather than the average fidelity additionally averaged over the ensemble of gates.
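The defining property of the Clifford group can be checked numerically. The following Python sketch is an illustration only, not part of the experimental procedure: it builds the 24 single-qubit Cliffords (modulo global phase) by closing the Hadamard and phase gates under multiplication, and verifies that each one maps Paulis to Paulis under conjugation. The choice of {H, S} as generators and all helper names are our own assumptions.

```python
import numpy as np

# Pauli operators X, Y, Z.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [X, Y, Z]

# Assumed generators: Hadamard H and phase gate S generate the single-qubit Clifford group.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.diag([1, 1j]).astype(complex)


def same_up_to_phase(u, v):
    """Two 2x2 unitaries differ only by a global phase iff |Tr(U^dagger V)| = 2."""
    return abs(abs(np.trace(u.conj().T @ v)) - 2) < 1e-9


def clifford_group():
    """Close {H, S} under multiplication, keeping one representative per global phase."""
    identity = np.eye(2, dtype=complex)
    group, frontier = [identity], [identity]
    while frontier:
        candidates = [gen @ g for g in frontier for gen in (H, S)]
        frontier = []
        for c in candidates:
            if not any(same_up_to_phase(c, h) for h in group):
                group.append(c)
                frontier.append(c)
    return group


def maps_paulis_to_paulis(u):
    """Check that U P U^dagger equals +/- a Pauli for every Pauli P."""
    for p in PAULIS:
        image = u @ p @ u.conj().T
        if not any(np.allclose(image, s * q) for q in PAULIS for s in (1, -1)):
            return False
    return True


cliffords = clifford_group()
print(len(cliffords))                                      # 24
print(all(maps_paulis_to_paulis(c) for c in cliffords))    # True
```

Multiplying on the left by the generators from the identity reaches every word in H and S, so the breadth-first closure terminates with the full (finite) group.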

Consider a general noise process $\mathcal{D}$, depicted in Fig. 1(b), which represents the deviation of a noisy Clifford gate $C_n$ from an ideal unitary Clifford operation $C_i$:

$$C_n = \mathcal{D}\, C_i .$$

We note that the above equation uses the formalism of completely positive maps […], and the multiplication corresponds to composition of maps. The standard approach to randomized benchmarking assumes that $\mathcal{D}$ does not depend on the choice of $C_i$ or on other details such as time, but our simulations, and of course real experiments, include such a dependence.

The fundamental result of randomized benchmarking […] is that, for sufficiently well-behaved noise, the observed fidelities depend only on the average error operation $\mathcal{E}_\mathcal{D}$, averaged over the Clifford group $\mathcal{G}$,

$$\mathcal{E}_\mathcal{D} = \frac{1}{|\mathcal{G}|} \sum_{C_i \in \mathcal{G}} C_i\, \mathcal{D}\, C_i^{-1},$$

as well as on any SPAM errors present in the system. Furthermore, standard tools from representation theory reduce this average error operation to one that depends on $\mathcal{D}$ only through a single parameter $p$. In particular, it is a depolarizing channel $\mathcal{E}$ with $p = p(\mathcal{D})$ being the polarization parameter (i.e., the probability that the information remains uncorrupted as it passes through the channel). For a $d$-dimensional quantum system, the action of the depolarizing channel is given by

$$\mathcal{E}(\rho) = p\,\rho + (1 - p)\frac{I}{d},$$

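and the polarization parameter is related to the noisy deviation $\mathcal{D}$ by the average gate fidelity $F_{\rm avg}(\mathcal{D})$ according to […]

$$F_{\rm avg}(\mathcal{D}) = \int d\psi\, \langle\psi|\,\mathcal{D}(|\psi\rangle\langle\psi|)\,|\psi\rangle = p + \frac{1 - p}{d}, \tag{1}$$

where the integral is a uniform average over all pure states.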
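To illustrate the twirling step and Eq. (1), the sketch below averages a hypothetical noise channel $\mathcal{D}$ (a small Z over-rotation followed by amplitude damping; the parameters `gamma` and `theta` are arbitrary illustrative choices, not values from the experiment) over the single-qubit Clifford group. In the Pauli transfer matrix representation the twirled channel should come out as diag(1, p, p, p), i.e. depolarizing, and the fidelity obtained from p via Eq. (1) should match a direct average over the six Pauli eigenstates, which form a state 2-design and therefore reproduce the uniform integral exactly for this quadratic integrand.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.diag([1, 1j]).astype(complex)


def clifford_group():
    """24 single-qubit Cliffords (modulo global phase) generated by H and S."""
    group, frontier = [I2], [I2]
    while frontier:
        candidates = [gen @ g for g in frontier for gen in (H, S)]
        frontier = []
        for c in candidates:
            if not any(abs(abs(np.trace(h.conj().T @ c)) - 2) < 1e-9 for h in group):
                group.append(c)
                frontier.append(c)
    return group


# Hypothetical noise D: Z over-rotation by theta, then amplitude damping with rate gamma.
gamma, theta = 0.01, 0.05
U = np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])
KRAUS = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]]) @ U,
         np.array([[0, np.sqrt(gamma)], [0, 0]]) @ U]


def noise(rho):
    return sum(k @ rho @ k.conj().T for k in KRAUS)


def twirled(rho, group):
    """E_D(rho): average of C_i D C_i^{-1}, i.e. rho -> C D(C^dagger rho C) C^dagger."""
    return sum(c @ noise(c.conj().T @ rho @ c) @ c.conj().T for c in group) / len(group)


def ptm(channel):
    """Pauli transfer matrix R_ij = Tr[P_i channel(P_j)] / 2."""
    return np.array([[np.trace(pi @ channel(pj)).real / 2 for pj in PAULIS]
                     for pi in PAULIS])


G = clifford_group()
R = ptm(lambda rho: twirled(rho, G))
p = (R[1, 1] + R[2, 2] + R[3, 3]) / 3          # polarization parameter
F_avg = p + (1 - p) / 2                          # Eq. (1) with d = 2

# Direct check: average <psi|D(|psi><psi|)|psi> over the six Pauli eigenstates.
s = 1 / np.sqrt(2)
kets = [np.array(v, dtype=complex) for v in
        ([1, 0], [0, 1], [s, s], [s, -s], [s, 1j * s], [s, -1j * s])]
F_direct = np.mean([(k.conj() @ noise(np.outer(k, k.conj())) @ k).real for k in kets])

print(np.round(R, 6))       # diag(1, p, p, p)
print(F_avg, F_direct)      # equal up to rounding
```

Although the hypothetical channel is non-unital, the twirl removes its non-unital part, which is why a single parameter survives.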

For a randomized benchmarking sequence comprising $m + 1$ total Clifford gates (including the +1 for the recovery operation), the average sequence fidelity is given by [2]

$$F_m = A p^m + B. \tag{2}$$


Here the parameters $A$ and $B$ quantify the SPAM errors and are given by […]

$$A = \mathrm{Tr}\big[E\, \mathcal{D}(\rho - I/d)\big], \qquad B = \mathrm{Tr}\big[E\, \mathcal{D}(I/d)\big],$$

where $\rho$ and $E$ are the noisy state preparation and measurement implemented instead of the ideal desired state and measurement.

A typical benchmarking experiment proceeds by estimating $F_m$ for several values of $m$, fitting to the model in Eq. (2) to extract the fit parameters $p$, $A$, and $B$, and then using Eq. (1) to report an ensemble average of the average gate fidelities $F_{\rm avg}$ of the gates.
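The fitting step can be sketched as follows. This is a minimal Python illustration on synthetic data: the values `A_true`, `p_true`, `B_true` and the noise level are hypothetical, not experimental, and the sequence lengths are those quoted for the data of Ref. [12]. To stay dependency-free, the sketch scans $p$ on a grid and solves the remaining linear problem for $A$ and $B$ exactly; a nonlinear least-squares routine such as `scipy.optimize.curve_fit` would serve equally well.

```python
import numpy as np

M = np.array([2, 3, 5, 8, 13, 21, 30, 40, 50, 70, 100, 150])


def fit_zero_order(m, f, p_grid=np.arange(9000, 10000) / 10000):
    """Least-squares fit of F_m = A p^m + B (Eq. 2): scan p, solve linearly for A and B."""
    best = None
    for p in p_grid:
        design = np.column_stack([p ** m, np.ones(m.size)])
        coef, *_ = np.linalg.lstsq(design, f, rcond=None)
        rss = float(np.sum((f - design @ coef) ** 2))
        if best is None or rss < best[0]:
            best = (rss, coef[0], p, coef[1])
    _, a, p, b = best
    return a, p, b


# Hypothetical synthetic data (not experimental values).
rng = np.random.default_rng(seed=1)
A_true, p_true, B_true = 0.45, 0.99, 0.5
f_obs = A_true * p_true ** M + B_true + rng.normal(0.0, 0.003, M.size)

A, p, B = fit_zero_order(M, f_obs)
F_avg = p + (1 - p) / 2          # Eq. (1) with d = 2
print(f"A = {A:.3f}, p = {p:.4f}, B = {B:.3f}, F_avg = {F_avg:.4f}")
```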

This derivation of Eq. (2) assumes certain features of the noise, namely that it has negligible time and gate dependence, and that non-Markovian effects are not present on timescales of the order of the gate time. The limits to the validity of these assumptions have been probed before [15, 18, 19], and in particular it was noted via numerical simulations by Epstein et al. [15] that the exponential model of fidelity decay no longer holds in the presence of 1/f noise, resulting in a noise floor to the accuracy of the benchmarking experiment.


II. NON-EXPONENTIAL FIDELITY DECAY

A clear deviation from the fidelity decay predicted by Eq. (2) has been observed in a silicon quantum dot qubit [12]. To understand the possible origin of this deviation, we have used the qubit characteristics to numerically simulate randomized benchmarking with a realistic noise model. In the experiment, the qubit is defined by the spin state of a single electron. A magnetic field $B_0 = 1.4$ T is applied to create a Zeeman splitting, and the qubit is operated using electron spin resonance (ESR) techniques by applying an AC magnetic field with frequency $\omega_0$. A Rabi $\pi$-pulse is realized in $t_\pi = 1.6\ \mu$s, and a Ramsey sequence yields the dephasing time $T_2^* = 120\ \mu$s [12]. Between consecutive pulses, a waiting time $t_w = 0.5\ \mu$s has to be incorporated, due to the operation of the analog microwave source.

The set of Clifford gates is generated using the set $\{\pm X, \pm Y, \pm X/2, \pm Y/2\}$, realized using Rabi pulses, with the identity simulated by a waiting time equal to a $\pi$-pulse. The noise processes that determine $T_2^*$ can be modelled as a random walk of the detuning $\Delta\omega$ away from the ideal operation frequency $\omega_0$, over timescales longer than a single run of a random Clifford sequence. To simulate an ensemble of results, $\Delta\omega$ is selected randomly from a Gaussian distribution whose standard deviation, in frequency units, is

$$\sigma = \frac{1}{2\pi}\,\frac{\sqrt{2}}{T_2^*}.$$

Using this distribution, we have numerically simulated benchmarking experiments; the results are shown in Fig. 2. Each individual trace corresponds to a given detuning $\Delta\omega$ and yields the “instantaneous” fidelity of the qubit. While the individual traces decay exponentially, the averaged fidelity (bold blue) obtained from the Gaussian ensemble is clearly non-exponential. We have also included the case of a Lorentzian distribution of detunings (red), which results in a non-exponential decay as well. In the simulation the only error source is dephasing, whereas in the experiment other errors, such as pulse errors, might be present. Including such errors still results in non-exponential decays, provided dephasing is a significant source of error. We note that in the experiment, Ramsey sequences were performed between benchmarking sequences to recalibrate the resonance frequency of the qubit and to compensate for drifts due to, for example, the superconducting magnet. These drifts, in combination with errors in setting the resonance frequency, result in an apparent dephasing time $T_2^*$ in the randomized benchmarking experiment that depends on the duration of the data acquisition, causing a faster decay of the ensemble-averaged sequence fidelity.
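The mechanism behind Fig. 2 can be illustrated without simulating individual pulse sequences. The Python sketch below is a simplified model, not the simulation used for Fig. 2. It assumes that a detuning $\Delta\omega$ acts as a gate-independent Z rotation by $\theta = \Delta\omega\, t_c$ after every Clifford, with a hypothetical average Clifford duration $t_c = 3\ \mu$s and ideal SPAM, so that each trace decays as $F_m = 1/2 + p(\Delta\omega)^m/2$ with $p = (1 + 2\cos\theta)/3$. It then averages over Gaussian-distributed $\Delta\omega$ of width $\sqrt{2}/T_2^*$ in angular-frequency units (our assumption, the width implied by a Gaussian Ramsey decay) using Gauss–Hermite quadrature. Each trace is a straight line on a log scale, but the logarithm of an average of exponentials is convex in $m$, so the ensemble average decays more slowly at large $m$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

T2_STAR = 120e-6                 # s, Ramsey dephasing time quoted in the text
T_CLIFFORD = 3e-6                # s, hypothetical average duration of one Clifford
SIGMA = np.sqrt(2) / T2_STAR     # rad/s, assumed Gaussian width of the detuning
M = np.array([2, 3, 5, 8, 13, 21, 30, 40, 50, 70, 100, 150])


def polarization(delta_omega, t=T_CLIFFORD):
    """p for a Z rotation by theta = delta_omega * t applied after every Clifford."""
    return (1 + 2 * np.cos(delta_omega * t)) / 3


def trace_fidelity(m, delta_omega):
    """One 'instantaneous' trace: exactly exponential in m (ideal SPAM, A = B = 1/2)."""
    return 0.5 + 0.5 * polarization(delta_omega) ** m


def ensemble_fidelity(m, sigma=SIGMA, n_nodes=40):
    """Average of trace_fidelity over delta_omega ~ N(0, sigma^2) via Gauss-Hermite quadrature."""
    x, w = hermegauss(n_nodes)                   # nodes/weights for weight exp(-x^2 / 2)
    decays = polarization(sigma * x)[:, None] ** np.asarray(m)[None, :]
    return 0.5 + 0.5 * (w @ decays) / np.sqrt(2 * np.pi)


def secant_slope(f, i, j):
    """Slope of log(2F - 1), i.e. log of the average of p^m, between M[i] and M[j]."""
    y = np.log(2 * f - 1)
    return (y[j] - y[i]) / (M[j] - M[i])


single = trace_fidelity(M, SIGMA)                # one trace at a one-sigma detuning
ensemble = ensemble_fidelity(M)
print(secant_slope(single, 0, 4), secant_slope(single, 10, 11))      # equal: exponential
print(secant_slope(ensemble, 0, 4), secant_slope(ensemble, 10, 11))  # early decay is steeper
```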

As low-frequency drift of the qubit resonance frequency can lead to non-exponential fidelity decay, we hypothesize that an ensemble of experiments with varying decay rates is the correct explanation for the non-exponential behaviour of the experimental benchmarking data [12]. To support this hypothesis, we use the Akaike information criterion to show that a simple model allowing for differing fidelity decay rates explains the data better than an alternative explanation that assumes fluctuating SPAM errors in the standard (zero-order) model.


A. Eliminating the constant for a single-qubit randomized benchmarking model

The parameters $A$ and $B$ in Eq. (2) are nuisance parameters that convey no information about the desired fidelity. Eliminating one of them, in this case $B$, further constrains the zero-order model and makes deviations from it easier to identify. A further advantage of removing $B$ is that the data can then be fitted with a linear function on a log-linear plot. In Ref. [12], the randomized benchmarking protocol was modified to eliminate $B$ from the zero-order model. We first provide a theoretical justification for this approach, which we note applies only to qubits ($d = 2$), and then demonstrate that the resulting data highlight the deviation of the measured data from the expected exponential decay model.

Recall that the zero-order model fits the average fidelity of a gate sequence to a simple formula […]:

$$F_\uparrow = A_\uparrow p^m + B_\uparrow, \tag{3}$$

where the qubit is initialized as $|\uparrow\rangle\langle\uparrow|$, the final gate in the random benchmarking sequence is chosen to return the state to $|\uparrow\rangle\langle\uparrow|$, and $F_\uparrow$ is the survival probability of this state. To eliminate the constant, it is only necessary to perform similar randomized sequences, except that the final $(m+1)$th gate is chosen to take the state to $|\downarrow\rangle\langle\downarrow|$. For these runs, we consider the probability $F_\downarrow$ of obtaining the measurement outcome $E_\downarrow$, where in the ideal case the final state is $\rho = |\downarrow\rangle\langle\downarrow|$; this is the survival probability for each $F_\downarrow$ run. Under the same assumptions we have

$$F_\downarrow = A_\downarrow p^m + B_\downarrow. \tag{4}$$

FIG. 2. a) Sequence fidelity as a function of sequence length $m$, with the qubit subject to Gaussian-distributed noise associated with $T_2^*$. Each light blue line represents the fidelity decay for one particular value of the detuning $\Delta\omega$. The linear decay on the logarithmic scale illustrates that these individual traces are indeed exponential, while the ensemble average (bold blue) is non-exponential. The bold red line is the ensemble average for Lorentzian-distributed noise. b) Gaussian-distributed and c) Lorentzian-distributed detuning frequencies associated with the individual traces.

Combining these two equations by defining $F_m = F_\uparrow - (1 - F_\downarrow)$, we have

$$F_m = A p^m + (B_\uparrow + B_\downarrow) - 1, \tag{5}$$

where $A = A_\uparrow + A_\downarrow$.

Recall that $B_\uparrow = \mathrm{Tr}[E_\uparrow \mathcal{D}(I/d)]$, where $\mathcal{D}$ is the average noise operator. For the $F_\downarrow$ runs the derivation is identical, apart from the final change to the $|\downarrow\rangle\langle\downarrow|$ state, so we have $B_\downarrow = \mathrm{Tr}[E_\downarrow \mathcal{D}(I/d)]$. Noting that $E_\uparrow + E_\downarrow = I$ for qubits ($d = 2$), that $\mathcal{D}$ is trace-preserving, and assuming that the error on the final X gate is negligible, $B_\downarrow$ can be re-expressed as

$$B_\downarrow = \mathrm{Tr}[E_\downarrow \mathcal{D}(I/2)] = \mathrm{Tr}[(I - E_\uparrow)\mathcal{D}(I/2)] = \mathrm{Tr}[\mathcal{D}(I/2)] - \mathrm{Tr}[E_\uparrow \mathcal{D}(I/2)] = 1 - B_\uparrow. \tag{6}$$

Therefore, by subtracting the average results of the data set $(1 - F_\downarrow)$ from the average results of the data set $F_\uparrow$, we obtain a data set that, under the standard benchmarking assumptions on the noise, is distributed according to the model

$$F_m = A p^m. \tag{7}$$
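The cancellation in Eqs. (5)–(7) can be checked in a few lines of Python. In this noise-free sketch the amplitudes `A_up`, `A_down`, the offset `B_up`, and `p` are hypothetical values chosen for illustration, with `B_down` fixed by Eq. (6). The combined data then fall on a straight line of $\log F_m$ against $m$, so an ordinary linear fit (`numpy.polyfit`) returns $\log p$ as the slope and $\log A$ as the intercept.

```python
import numpy as np

M = np.array([2, 3, 5, 8, 13, 21, 30, 40, 50, 70, 100, 150])
p = 0.99                       # hypothetical polarization parameter
A_up, A_down = 0.40, 0.35      # hypothetical SPAM amplitudes
B_up = 0.58                    # hypothetical offset, deliberately away from 1/2
B_down = 1 - B_up              # Eq. (6)

F_up = A_up * p ** M + B_up          # Eq. (3)
F_down = A_down * p ** M + B_down    # Eq. (4)
F = F_up - (1 - F_down)              # Eq. (5): the constant cancels, leaving Eq. (7)

# Eq. (7) is a straight line on a log-linear plot: log F = log A + m log p.
slope, intercept = np.polyfit(M, np.log(F), 1)
print(np.exp(slope), np.exp(intercept))    # ~0.99 and ~0.75 = A_up + A_down
```

With real data, any residual offset that survives the subtraction shows up as curvature on this plot, which is what makes the modified protocol a sensitive test of the exponential model.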

FIG. 3. Semi-log plot of $F_\uparrow - (1 - F_\downarrow)$ versus sequence length $m$ for the reference sequence of randomized benchmarking on a silicon quantum dot qubit [12]. Both the two fidelity model and a single fidelity model including residual SPAM can fit the data, but the single fidelity model requires an unreasonably large SPAM contribution.

The data from Ref. [12] consist of 8 data sets (one reference set and 7 interleaved sets). For each data set, gate sequences of lengths $m \in \{2, 3, 5, 8, 13, 21, 30, 40, 50, 70, 100, 150\}$ were measured; for each sequence length, the randomized protocol was carried out 500 times, randomly distributed over $F_\uparrow$ and $F_\downarrow$. This amount of randomization is at least an order of magnitude more than in previous experiments […]. Each randomized protocol was performed 50 times to estimate the survival probability.

B. Analysis of experimental data 

To quantify the quality of our fits, we require estimates of the variance in the data of Ref. [12]. The observed variance of the data matched the theoretical upper bounds derived in Ref. [18] to within 5–40% when the sequence length was shorter than 20 (so that the assumption $m(1 - F_{\rm avg}) \ll 1$ discussed in that reference was satisfied). Accordingly, the observed experimental variance was used as a reliable estimate of the actual variance of the distribution. It should be noted that the observed variance actually decreased for sequence lengths of 100 or greater. One explanation for this unexpected behaviour is that some of the sequences become saturated to something close to a completely mixed state before reaching those sequence lengths.

Figure 3 shows the data from the reference data set on a semi-log plot. The confidence bounds shown are 95%, and the data are clearly non-linear (i.e., the decay is not a simple exponential). A similar deviation from the linear fit was noted in each of the data sets, with the best-fit linear model consistently underestimating $F_m$ for $m > 100$.

Dataset    AIC (Ap^m + B)    AIC (Ap^m + Aq^m)    Comparison
Ref        -16.93            -25.29               65.44
I          -46.19            -57.12               238.10
X          -54.52            -59.99               15.43
X/2        -62.89            -63.79               1.56
-X/2       -57.77            -64.34               26.69
Y          -36.06            -50.43               1317
Y/2        -36.04            -46.39               172.0
-Y/2       -46.37            -63.32               4815

TABLE I. Akaike information criterion (AIC) for standard and interleaved randomized benchmarking. The Comparison column specifies how many times as probable the $Ap^m + Aq^m$ model is to minimize information loss as compared to the $Ap^m + B$ model.

Two possible explanations are considered. First, it may not be possible to entirely eliminate the constant term ($B$) due to a violation of one of the assumptions in the above derivation. Second, low-frequency noise may lead to detuning, and hence to time-dependent errors on the gates, in some of the experiments. The first explanation, which we denote the residual SPAM model, can be modelled by reverting to a formula of the form $F_m = Ap^m + B$, where $B$ now represents residual SPAM errors that were not eliminated under the assumptions that led to Eq. (7). For the second explanation we consider the simplest possible model, the two fidelity model, which fits the fidelity decay to a formula of the form $F_m = Ap^m + Aq^m$. This model simplifies the ensemble of experiments by reducing it to just two equally weighted sequence behaviours: one with a high fidelity (related to $p$ by the usual measure) and one with a lower fidelity (similarly related to $q$). It has fewer parameters than the Gaussian or Lorentzian drift models and is much easier to fit. In this interpretation, we have successfully eliminated the $B$ parameter as per Eq. (7), but time variation gives us two different polarization parameters $p$ and $q$, with the decay rate for each sequence sampled randomly with equal probability. As can be seen in Fig. 3, both models fit the data substantially better than the simple exponential of the zero-order model.
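The two competing fits can be sketched in Python as follows. The synthetic data set is hypothetical and is generated from the two fidelity model itself (A = 0.4, p = 0.995, q = 0.96, no noise), so the example only illustrates the mechanics rather than the analysis of Ref. [12]: for each candidate rate (or pair of rates) on a grid, the amplitude(s) are obtained by linear least squares, and the best residual sum of squares is kept. The grids are our own choices.

```python
import numpy as np

M = np.array([2, 3, 5, 8, 13, 21, 30, 40, 50, 70, 100, 150])


def fit_residual_spam(f, p_grid):
    """Best (rss, (A, p, B)) for F_m = A p^m + B, scanning p and solving for A, B."""
    best = (np.inf, None)
    for p in p_grid:
        design = np.column_stack([p ** M, np.ones(M.size)])
        coef, *_ = np.linalg.lstsq(design, f, rcond=None)
        rss = float(np.sum((f - design @ coef) ** 2))
        if rss < best[0]:
            best = (rss, (coef[0], p, coef[1]))
    return best


def fit_two_fidelity(f, p_grid, q_grid):
    """Best (rss, (A, p, q)) for F_m = A p^m + A q^m (equal weights)."""
    pm = p_grid[:, None] ** M[None, :]               # shape (len(p_grid), len(M))
    best = (np.inf, None)
    for q in q_grid:
        basis = pm + q ** M                          # one basis curve per candidate p
        amp = basis @ f / np.sum(basis ** 2, axis=1) # least-squares A for each p
        rss = np.sum((f[None, :] - amp[:, None] * basis) ** 2, axis=1)
        i = int(np.argmin(rss))
        if rss[i] < best[0]:
            best = (float(rss[i]), (amp[i], p_grid[i], q))
    return best


# Hypothetical noise-free data generated from the two fidelity model.
f_obs = 0.4 * 0.995 ** M + 0.4 * 0.96 ** M

rss_spam, (A1, p1, B1) = fit_residual_spam(f_obs, np.arange(9000, 10000) / 10000)
rss_two, (A2, p2, q2) = fit_two_fidelity(f_obs, np.arange(9800, 10000) / 10000,
                                         np.arange(9000, 9800) / 10000)
print(f"residual SPAM: A={A1:.3f} p={p1:.4f} B={B1:.3f} rss={rss_spam:.2e}")
print(f"two fidelity:  A={A2:.3f} p={p2:.4f} q={q2:.4f} rss={rss_two:.2e}")
```

Both models have three free parameters, so on real data the residuals alone are not decisive; the likelihood comparison in the next subsection weighs them properly.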

Although the residual SPAM model produces a good fit to the experimental data, it does so with an unusually large SPAM parameter $B$ of around 0.14, corresponding to a $B_\uparrow$ of 0.57. In the theoretical model, this represents a very large bias, away from 0.5, in the expectation value of the spin-up measurement on the completely mixed state, which is not observed in the experiment. This suggests that this model may not be the best explanation of the observed data.
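The quoted correspondence between $B \approx 0.14$ and a spin-up offset of 0.57 follows from Eq. (5) if one further assumes, purely for illustration, that the residual offset is shared equally between the two data sets ($B_\uparrow = B_\downarrow$), so that Eq. (6) no longer holds:

```latex
B = B_\uparrow + B_\downarrow - 1,
\qquad B_\uparrow = B_\downarrow
\;\Longrightarrow\;
B_\uparrow = \frac{1 + B}{2} \approx \frac{1 + 0.14}{2} = 0.57 .
```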

FIG. 4. (a) Reference sequence of randomized benchmarking on a silicon quantum dot qubit [12]. The separate fidelities from the two fidelity model have been plotted to show how the initial decay is dominated by the low $q$ value, whereas the higher value of $p$ is indicative of the average decay in the longer-lived high-fidelity regime. Histograms of spin-up $|\uparrow\rangle$ and spin-down $|\downarrow\rangle$ outcomes for the data points $m = 2$ (b) and $m = 150$ (c). Results with an expected spin-up outcome are shown in red, while blue represents data with an expected spin-down result. The grey regions illustrate the overlapping areas.

To compare these two models quantitatively, we calculate the log likelihood and the Akaike information criterion [21] for both. Because we do not have the actual distribution of the test statistic, we assume that the samples contained in the underlying data are independent and that the Gaussian limit is appropriate. This assumption is well justified, as we have a large number of independent data sets. The distribution of $F_m$ can therefore be approximated by a Gaussian distribution whose variance is estimated by the observed variance at each sequence length. The log likelihood of the observed data, given each of the two models, can then be calculated using standard methods.
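The likelihood comparison can be sketched in Python under the Gaussian assumption just described. The helper names are our own. The relative-probability formula $\exp[(\mathrm{AIC}_1 - \mathrm{AIC}_2)/2]$ is the standard Akaike ratio, and applying it to the AIC values in Table I reproduces the Comparison column to within rounding (about 65.4 for the reference row). Since both models have three free parameters, the AIC difference here reduces to twice the difference in log likelihood.

```python
import numpy as np


def gaussian_log_likelihood(observed, predicted, variance):
    """Log likelihood of independent Gaussian data with per-point variances."""
    r = np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)
    v = np.asarray(variance, dtype=float)
    return float(-0.5 * np.sum(np.log(2 * np.pi * v) + r ** 2 / v))


def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def relative_probability(aic_a, aic_b):
    """How many times as probable model b is as model a to minimize information loss."""
    return float(np.exp((aic_a - aic_b) / 2))


# Reference row of Table I: Ap^m + B versus Ap^m + Aq^m.
print(round(relative_probability(-16.93, -25.29), 1))   # 65.4
```

In use, `observed` would be the combined $F_m$ values, `predicted` a model curve from one of the two fits, and `variance` the observed variance at each sequence length.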

Table I shows the calculated Akaike information criterion for each of the experimental data sets. As can be seen, the two fidelity model explains the data better, and significantly so for all but one of the data sets. Although this model is a simplified version of the drift model, the facts that it fits the data well and is physically motivated support its adoption as the most likely explanation of the non-exponential decay seen in the data.


C. Interpreting the two fidelity model 


Dataset    p        q        F_avg^(p/p)    F_avg^(q/q)    F_avg^(q/p)    Uncertainty
Ref        0.995    0.959    99.9%          98.9%          -              0.06%
I          0.993    0.946    99.9%          99.6%          98.7%          0.3%
X          0.993    0.952    99.9%          99.8%          98.9%          0.2%
X/2        0.993    0.947    99.9%          99.7%          98.7%          0.2%
-X/2       0.991    0.947    99.9%          99.7%          98.7%          0.2%
Y          0.993    0.964    99.9%          99.9%          99.1%          0.3%
Y/2        0.991    0.952    99.9%          99.8%          98.9%          0.2%
-Y/2       0.990    0.911    99.9%          98.7%          97.8%          0.2%

TABLE II. Calculated $p$ and $q$ values for the two fidelity model. The gate fidelity estimates ($F_{\rm avg}$) reported for the reference run are the high ($p$) and low ($q$) gate fidelity estimates, respectively. For the interleaved runs, three comparative estimates are reported: the first, $F_{\rm avg}^{p/p}$, is calculated by comparing the higher fidelity of the interleaved run with the higher fidelity of the reference run; the second, $F_{\rm avg}^{q/q}$, by comparing the lower fidelity of the interleaved run with the lower fidelity of the reference run; and the third, $F_{\rm avg}^{q/p}$ (which represents the worst-case way of calculating this), by comparing the low interleaved fidelity with the high reference fidelity. The uncertainty for the reference set is calculated directly from the data fit. For the interleaved runs, the formulas provided in Ref. [8] were used to determine the likely error margins.

Since the two fidelity model is the quantitatively preferred model, a natural question arises: how should we interpret its parameters? The obvious interpretation of the two parameters $p$ and $q$, as presented in Table II, is that their difference represents the characteristic spread of the underlying ensemble of fidelities from which the benchmarking data are sampled. Such an interpretation is natural and compelling; however, it remains an open problem to quantify this connection more carefully. In particular, it would be interesting to give a direct connection to a more general drift model, since such models are easier to interpret physically, but much harder to fit and analyze statistically.

By considering the non-exponential decay as the average over an ensemble of results, the fidelity can be considered to operate in two regimes, as depicted in Fig. 4(a). First, dominating the observed fidelity decay at low $m$, there is a rapid, short-lived decay due to traces with large detuning $\Delta\omega$. Second, for large $m$, there is an approximately exponential tail due to long-lived traces with smaller detuning. The second regime can be approximated as exponential because the contributions of the traces with larger detuning have decayed to negligible levels as $m$ increases. In Fig. 4(a), each data point is an average over 25,000 experimental repetitions, as presented by the two accompanying histograms (Fig. 4(b) for $m = 2$ and Fig. 4(c) for $m = 150$). Each histogram separately shows the measured probability, averaged over 50 repetitions, for the spin-up and spin-down observables expected at the end of a noiseless version of the applied random sequence. From the second regime of Fig. 4(a), we can see that many experiments within the ensemble of measurements have an instantaneous fidelity of at least 99.9%.
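Converting fitted polarization parameters into the percentages of Table II requires Eq. (1) plus a convention for distributing the Clifford error over physical gates. The Python sketch below assumes, as a hypothetical convention not stated in the text, that the error per Clifford is divided by an average of 1.875 physical gates per Clifford. Under that assumption it reproduces the reference-row entries (99.9% for p = 0.995 and 98.9% for q = 0.959); it is not presented as the procedure actually used for the table.

```python
GATES_PER_CLIFFORD = 1.875   # hypothetical average number of physical gates per Clifford
D = 2                        # qubit dimension


def clifford_fidelity(p, d=D):
    """Eq. (1): average fidelity of the average Clifford, F = p + (1 - p)/d."""
    return p + (1 - p) / d


def gate_fidelity(p, d=D, gates_per_clifford=GATES_PER_CLIFFORD):
    """Spread the Clifford error 1 - F evenly over the physical gates (assumption)."""
    return 1 - (1 - clifford_fidelity(p, d)) / gates_per_clifford


for label, value in (("p", 0.995), ("q", 0.959)):
    print(label, f"{100 * gate_fidelity(value):.1f}%")   # p 99.9%, q 98.9%
```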


III. CONCLUSIONS 

We have analyzed the non-exponential decay in randomized benchmarking experiments on Si-MOS quantum dot qubits, and found that the most plausible explanation of this decay is drift in the detuning frequency. Our simulation of temporal integration over a spectrum of time-dependent detuning frequencies qualitatively reproduces the observed fidelity decay of previously conducted experiments [12]. In addition, we have quantitatively ruled out a competing model by showing that a simplified ensemble (the two fidelity model) agrees with the data and is much more probable. This gives us confidence that detuning drift is the correct explanation for the origin of such a non-exponential fidelity decay.

Fitting the randomized benchmarking data with a two fidelity model demonstrates that silicon MOS quantum dot qubits already exhibit an “instantaneous” control fidelity of 99.9%. We anticipate that this value is the relevant fidelity for quantifying the achievable performance of these gates for quantum computation, since improvements in the readout fidelity and the use of a fast Ramsey protocol to calibrate the resonance frequency for each experiment [22] could result in an ensemble fidelity that matches the best instantaneous fidelity, which is ultimately set by fixed errors.

These results raise several intriguing questions. The first is to quantitatively link the simple, easy-to-analyze two fidelity model to the Gaussian or Lorentzian drift models. Alternatively, directly fitting a drift ensemble to the data would give a better picture of the source of the non-exponential fidelity decay, but this approach risks overfitting and is already difficult for the simple case of Gaussian-distributed detunings.

Finally, there is at least one other natural competing explanation for the non-exponential decay. It might be the case that long benchmarking sequences saturate the exponential decay and exhibit slower decay on very long timescales. If this were the case, then fitting to sequences that were “too long” would certainly bias the analysis toward seeing non-exponential decay and reporting fidelities higher than warranted. Therefore, deriving stopping criteria for the maximum sequence length, and deriving tests that rule out this alternative explanation, is a further important open question for future work.